Fast Warping Distance for Sparse Time Series
Abstract
Dynamic Time Warping (DTW) distance has been used effectively to mine time series data in a multitude of domains. However, DTW in its original formulation is extremely inefficient at comparing long sparse time series, which contain mostly zeros and unevenly spaced non-zero observations. The original DTW distance does not take advantage of this sparsity and thus incurs a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on a run-length encoded representation of the series. The complexity of AWarp is quadratic in the number of observations rather than in the time range of the series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks, namely clustering, classification, and outlier detection, which are otherwise infeasible with classic DTW, while producing equivalent results. Potential application areas include bot detection, human activity classification, and unusual review pattern mining.
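To make the cost argument concrete, the following minimal Python sketch shows the two ingredients the abstract contrasts: a run-length encoding of zero runs and the classic quadratic DTW recurrence. It is not the authors' AWarp algorithm. The function names, the `("z", count)` token format, and the squared point cost are assumptions chosen for illustration only.

```python
def run_length_encode_zeros(series):
    """Collapse each run of zeros into one ("z", count) token (hypothetical format).

    Non-zero observations are kept unchanged, so the encoded length grows with
    the number of observations rather than with the time range of the series.
    """
    encoded = []
    zeros = 0
    for value in series:
        if value == 0:
            zeros += 1
        else:
            if zeros:
                encoded.append(("z", zeros))
                zeros = 0
            encoded.append(value)
    if zeros:
        encoded.append(("z", zeros))
    return encoded


def dtw(x, y):
    """Classic unconstrained DTW with squared point cost.

    Runs in O(len(x) * len(y)) time on the raw series, zeros included.
    Returns the accumulated squared cost (no square root is taken).
    """
    inf = float("inf")
    n, m = len(x), len(y)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


sparse = [0, 0, 0, 1, 0, 0, 0, 0, 2, 0]
encoded = run_length_encode_zeros(sparse)
# The raw series has 10 samples; the encoding has 5 tokens.
print(len(sparse), len(encoded), encoded)
```

In this sketch, `dtw` fills a table whose size depends on the raw lengths, whereas a measure defined on the encoded form, as the abstract describes for AWarp, would fill a table whose size depends only on the number of tokens.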
Similar resources
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, advances in information-gathering technologies such as GPS and GSM networks have led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Accurate Time Series Classification Using Partial Dynamic Time Warping
Dynamic Time Warping (DTW) has been widely used in the time series domain as a distance function for similarity search. Several works have used DTW to improve classification accuracy, as it can handle local time shifts in time series data through non-linear warping. However, some types of time series data do have several segments that one segment should not be compared to others even though...
Spider Algorithm for Clustering Time Series
With the rapid development of information technology, time series now accumulate in finance, medicine, industry, and so forth, and analyzing them is an urgent need in these applications. Consequently, time series clustering has received much attention. Similarity for clustering is commonly measured with Euclidean distance and dynamic time warp...
A New IRIS Segmentation Method Based on Sparse Representation
Iris recognition is one of the most reliable methods for identification. In general, it consists of image acquisition, iris segmentation, feature extraction, and matching. Among them, iris segmentation plays an important role in the performance of any iris recognition system. Nonlinear eye movement, occlusion, and specular reflection are the main challenges for any iris segmentation method. In thi...